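library(mixOmics)
library(tidyverse)
library(SummarizedExperiment)  # provides assays(), used below to extract the loess-normalised matrix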
Feature-level integration allows an integrated analysis of the same features measured in different samples against the same outcome variable. The advantage of this method is that the features do not have to come from the same samples. This is useful for this study because the full data set can be used wherever the phenotype annotation is available.
Here we use the “mixOmics” package (MINT) for an integrated analysis of the protein data from the three time points, in order to capture features from all three time points and their joint correlation with the outcome variable.
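To make the data layout concrete, the sketch below stacks three per-time-point matrices that share the same features but have different numbers of samples, and builds the matching `study` vector that tells the time points apart. Everything in it (the `prot1` to `prot6` feature names, the `make_block` helper, the sample counts, the random outcome) is made up for illustration and is not the study data.

```r
set.seed(1)

# Hypothetical feature names shared by all time points
features <- paste0("prot", 1:6)

# Hypothetical helper: a samples x features matrix of random values
make_block <- function(n) {
  matrix(rnorm(n * length(features)), nrow = n,
         dimnames = list(NULL, features))
}

# One block per time point; sample counts differ, features are identical
blocks <- list(`24h` = make_block(5), `48h` = make_block(4), `72h` = make_block(3))

# Stack the blocks row-wise: rows = samples, columns = shared features
X <- do.call(rbind, blocks)
rownames(X) <- paste0("s", seq_len(nrow(X)))

# Study membership of each row, in the same order as the stacked blocks
study <- factor(rep(names(blocks), times = vapply(blocks, nrow, integer(1))))

# Made-up binary outcome, one value per sample
Y <- factor(sample(c(0, 1), nrow(X), replace = TRUE))

dim(X)
table(study)
```

`X`, `Y` and `study` have the shapes expected by the `mint.plsda()` call used later in this report: one row of `X`, one outcome value and one study label per sample.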
Some factors to consider:
- The samples come from the same individuals at three time points, so the analysis may be confounded by the repeated measurements, and overfitting is possible.
- The purpose of this analysis is to extract the most important features, not to cross-validate a classifier.
- There is no independent test set on which to validate the classification.
To select the PCs that capture most of the variation in the phenotype variable, an iterative search was performed for the optimal number of PCs and the features associated with those PCs. The selected features can then be used to assess their ability to classify the phenotype variables.
In the first step we tune the number of PCs that capture most of the variation in the outcome variable. As in the following example, the first PC captures most of the biological variation.
plot(mint_res$perf.radc.pcs, col = color.mixo(5:7))
Fig 1: Number of PCs capturing maximum variation in the data
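The code that produced `mint_res$perf.radc.pcs` is not shown in this report. A minimal sketch of this kind of tuning step, assuming the object came from `perf()` on a MINT PLS-DA fit, is given below. It uses the public `stemcells` example data shipped with mixOmics as a stand-in for the study data, and the object names `fit`, `perf_res` and `ber` are illustrative.

```r
library(mixOmics)

# Public example data shipped with mixOmics, used here as a stand-in
data(stemcells)
X     <- stemcells$gene      # samples x features
Y     <- stemcells$celltype  # outcome factor
study <- stemcells$study     # study membership of each sample

# Fit with a generous number of components, then evaluate each one
fit <- mint.plsda(X = X, Y = Y, study = study, ncomp = 5)

# For MINT models, perf() evaluates by leaving one study out at a time
perf_res <- perf(fit)

# Balanced error rate per component (rows) and prediction distance (columns)
ber <- perf_res$global.error$BER
print(ber)

# Same kind of plot as in Fig 1
plot(perf_res, col = color.mixo(5:7))
```

The number of components is usually taken where the error rate stops decreasing; which prediction distance to read it from is a choice left to the analyst.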
After the PCs are selected, an iterative leave-one-out procedure is used to choose the optimal number of features. These features are then used to assess the classification error.
mint_res <- list()
# MINT PLS-DA on all features, with time point as the study factor
mint_res$mint_rand <- mint.plsda(X = assays(data_all_f)$loess %>% t,
                                 Y = as.factor(data_all_f$randomisation_code),
                                 study = data_all_f$time, ncomp = 5)
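The feature-selection step that produces the sparse models plotted below (for example `mint_res$rand.splsda.res`) is not shown in this report. The following is a hedged sketch of how such a step can look with `tune.mint.splsda()` and `mint.splsda()`, again on the `stemcells` example data rather than the study data. The grid in `keepx_grid`, the choice of two components and the object names are assumptions made for illustration.

```r
library(mixOmics)

data(stemcells)  # stand-in data, not the study data
X     <- stemcells$gene
Y     <- stemcells$celltype
study <- stemcells$study

# Candidate numbers of features to keep per component (arbitrary grid)
keepx_grid <- c(5, 10, 20, 50)

# For MINT, tuning leaves one study out at a time
tuned <- tune.mint.splsda(X = X, Y = Y, study = study, ncomp = 2,
                          test.keepX = keepx_grid,
                          dist = "max.dist", measure = "BER",
                          progressBar = FALSE)
tuned$choice.keepX   # selected number of features per component

# Refit the sparse model with the tuned keepX and list the selected features
final <- mint.splsda(X = X, Y = Y, study = study, ncomp = 2,
                     keepX = tuned$choice.keepX)
head(selectVar(final, comp = 1)$name)
```

In the setting of this report, the time point plays the role of the study, so each tuning round would hold out one time point.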
par(mar = c(4, 4, .1, .1))
plotIndiv(mint_res$mint_rand, legend = TRUE, title = 'Mint splsda: Temperature randomization',
          subtitle = 'Full data', ellipse = TRUE)
plotIndiv(mint_res$rand.splsda.res, study = 'global', legend = TRUE,
          subtitle = 'Selected features', ellipse = TRUE)
Fig 2: Individual plots showing sample grouping before (left) and after (right) feature selection
plotIndiv(mint_res$rand.splsda.res, study = 'all.partial', title = 'MINT sPLS-DA',
          subtitle = c("24h", "48h", "72h"))
par(mar = c(4, 4, 4, 4))
auroc(mint_res$rand.splsda.res )
## $Comp2
## AUC p-value
## 0 vs 1 0.773 1.866e-11
auroc(mint_res$rand.splsda.res, roc.study = "-24-" )
## $Comp2
## AUC p-value
## 0 vs 1 0.8378 6.863e-07
auroc(mint_res$rand.splsda.res, roc.study = "-48-" )
## $Comp2
## AUC p-value
## 0 vs 1 0.9502 1.422e-10
auroc(mint_res$rand.splsda.res, roc.study = "-72-" )
## $Comp2
## AUC p-value
## 0 vs 1 0.9108 3.571e-08
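As a reminder of what the AUC values above measure, the sketch below computes an AUC by hand from ranks: it is the proportion of (class 1, class 0) sample pairs in which the class 1 sample receives the higher score. The scores and labels are made up, and `auc_rank` is a hypothetical helper written for this illustration; it is not how `auroc()` is implemented internally.

```r
# Hypothetical helper: rank-based AUC for a binary outcome coded 0/1
auc_rank <- function(score, label) {
  stopifnot(length(score) == length(label))
  pos <- score[label == 1]
  neg <- score[label == 0]
  n_pos <- length(pos)
  n_neg <- length(neg)
  r <- rank(c(pos, neg))  # average ranks are used for ties
  # Sum of ranks of the positives, minus its minimum possible value,
  # divided by the number of positive/negative pairs
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up prediction scores and true classes
score <- c(0.9, 0.8, 0.35, 0.6, 0.3, 0.1)
label <- c(1, 1, 1, 0, 0, 0)

auc <- auc_rank(score, label)
auc  # 8 of the 9 pairs are ordered correctly, so the AUC is 8/9
```

Because the AUCs in this report are computed on the same samples used to fit the models, with no independent test set, they should be read as optimistic estimates.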
### Correlation circle plot
The correlation circle plot shows the relationship between the variables selected across the time points and their contribution to the separation in the two-dimensional plot.
plotVar(mint_res$rand.splsda.res)
plotLoadings(mint_res$rand.splsda.res, study = "all.partial")
par(mar = c(4, 4, .1, .1))
plotIndiv(mint_res$mint_cpc, legend = TRUE, title = 'Mint splsda: CPC score (dichotomised)',
          subtitle = 'Full data', ellipse = TRUE)
plotIndiv(mint_res$cpc.splsda.res, study = 'global', legend = TRUE,
          subtitle = 'Selected features', ellipse = TRUE)
Fig 3: Individual plots showing sample grouping by dichotomised CPC score before (left) and after (right) feature selection
plotIndiv(mint_res$cpc.splsda.res, study = 'all.partial', title = 'MINT sPLS-DA',
          subtitle = c("24h", "48h", "72h"))
par(mar = c(4, 4, 4, 4))
auroc(mint_res$cpc.splsda.res )
## $Comp2
## AUC p-value
## 0 vs 1 0.9007 0
auroc(mint_res$cpc.splsda.res, roc.study = "-24-" )
## $Comp2
## AUC p-value
## 0 vs 1 0.8814 3.463e-08
auroc(mint_res$cpc.splsda.res, roc.study = "-48-" )
## $Comp2
## AUC p-value
## 0 vs 1 0.9299 1.139e-09
auroc(mint_res$cpc.splsda.res, roc.study = "-72-" )
## $Comp2
## AUC p-value
## 0 vs 1 0.9513 1.587e-09
### Correlation circle plot
The correlation circle plot shows the relationship between the variables selected across the time points and their contribution to the separation in the two-dimensional plot.
plotVar(mint_res$cpc.splsda.res)
plotLoadings(mint_res$cpc.splsda.res, study = "all.partial")
par(mar = c(4, 4, .1, .1))
plotIndiv(mint_res$mint_shock, legend = TRUE, title = 'Mint splsda: shockable vs. non shockable',
          subtitle = 'Full data', ellipse = TRUE)
plotIndiv(mint_res$shock.splsda.res, study = 'global', legend = TRUE,
          subtitle = 'Selected features', ellipse = TRUE)
Fig 4: Individual plots showing sample grouping by shockable vs. non-shockable status before (left) and after (right) feature selection
plotIndiv(mint_res$shock.splsda.res, study = 'all.partial', title = 'MINT sPLS-DA',
          subtitle = c("24h", "48h", "72h"))
par(mar = c(4, 4, 4, 4))
auroc(mint_res$shock.splsda.res )
## $Comp2
## AUC p-value
## 0 vs 1 0.6607 0.0008015
auroc(mint_res$shock.splsda.res, roc.study = "-24-" )
## $Comp2
## AUC p-value
## 0 vs 1 0.961 1.728e-09
auroc(mint_res$shock.splsda.res, roc.study = "-48-" )
## $Comp2
## AUC p-value
## 0 vs 1 0.9205 2.415e-07
auroc(mint_res$shock.splsda.res, roc.study = "-72-" )
## $Comp2
## AUC p-value
## 0 vs 1 0.9048 4.109e-05
### Correlation circle plot
The correlation circle plot shows the relationship between the variables selected across the time points and their contribution to the separation in the two-dimensional plot.
plotVar(mint_res$shock.splsda.res)
plotLoadings(mint_res$shock.splsda.res, study = "all.partial")